This repository has been archived by the owner on Jun 9, 2024. It is now read-only.

First commit for AutoGPT Benchmarks #1

Merged · 3 commits · Apr 17, 2023

Conversation

dschonholtz (Contributor) commented:

The highlights here are as follows:

  1. Run AutoGPT in a Docker container and give it a standard prompt for every task in an eval.
  2. Run it in continuous mode based on a config, so it doesn't ask for user input.
  3. Map all of that to an OpenAI completionFn so that, when we want to run an OpenAI (and later our own) eval, we can just point it at the eval with: EVALS_THREADS=1 EVALS_THREAD_TIMEOUT=600 oaieval auto_gpt_completion_fn <EVAL_NAME> --registry_path $PWD/auto_gpt_benchmarking (a sketch of such a completion function follows below).
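
For readers unfamiliar with the OpenAI Evals plumbing, here is a minimal sketch of what such a completion function can look like. It assumes the `evals` package's `CompletionResult` interface and a locally built Auto-GPT Docker image; the image name, flags, and timeout are illustrative assumptions, not the exact code in this PR.

```python
import subprocess

from evals.api import CompletionResult


class AutoGPTCompletionResult(CompletionResult):
    """Wraps the agent's final output so Evals can grade it."""

    def __init__(self, response: str) -> None:
        self.response = response

    def get_completions(self) -> list[str]:
        # Evals expects a list of completion strings to grade.
        return [self.response]


class AutoGPTCompletionFn:
    """Satisfies the Evals CompletionFn protocol by shelling out to Auto-GPT."""

    def __call__(self, prompt, **kwargs) -> AutoGPTCompletionResult:
        # Run Auto-GPT in continuous mode inside Docker so it never pauses
        # for user input; the prompt becomes the agent's task.
        # "auto-gpt-benchmark" is a hypothetical image name.
        proc = subprocess.run(
            ["docker", "run", "--rm", "-i", "auto-gpt-benchmark", "--continuous"],
            input=str(prompt),
            capture_output=True,
            text=True,
            timeout=600,
        )
        return AutoGPTCompletionResult(proc.stdout)
```

Registered under the name auto_gpt_completion_fn in the registry passed via --registry_path, this is roughly the object the oaieval command above resolves and calls once per eval sample.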

This generates results somewhere... I'm still working on finding where they land.

Quoted from the diff (the tail of a docstring, followed by the next method):

```python
If the model has used more than 50,000 tokens, it kills the model.
If the model has used less than 50,000 tokens, it returns the output.txt file.
"""

def _clean_up_workspace(self):
    ...
```
dschonholtz (Contributor, Author) commented:

Need to change this to clear everything out of the dir. It's brittle as-is: other files sometimes get created, and they confuse future agents.
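
Taken together, the quoted docstring and the comment above describe two small pieces of harness logic: a token-budget guard and a full workspace reset. Here is a minimal sketch of both, using hypothetical names (collect_result, reset_workspace, kill_agent) rather than the PR's actual code:

```python
import shutil
from pathlib import Path

TOKEN_BUDGET = 50_000  # the 50,000-token cutoff from the docstring


def collect_result(tokens_used: int, workspace: Path, kill_agent) -> str | None:
    """Return output.txt if the run stayed under budget, else kill the agent."""
    if tokens_used > TOKEN_BUDGET:
        # Over budget: stop the agent instead of letting it keep spending tokens.
        kill_agent()
        return None
    # Under budget: hand back whatever the agent wrote to output.txt.
    return (workspace / "output.txt").read_text()


def reset_workspace(workspace: Path) -> None:
    """Clear everything out of the dir, as the comment above suggests,
    so stray files from one run can't confuse the next agent."""
    shutil.rmtree(workspace, ignore_errors=True)
    workspace.mkdir(parents=True)
```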

@dschonholtz merged commit 22d997d into Significant-Gravitas:master on Apr 17, 2023.